Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DatetimeIndex.insert doesnt preserve name and tz #7299

Merged
merged 1 commit into from
Jun 6, 2014

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented May 31, 2014

DatetimeIndex.insert doesn't preserve name and tz attributes. Also modified DatetimeIndex.asobject to return an object Index which has the same name as original to cover the case when the result is being object Index.

# normal Index preserves its name after insertion
idx = pd.Index([1, 2, 3], name='normal')
inserted = idx.insert(0, 1)
inserted.name
# normal

# But DatetimeIndex doesn't
dtidx = pd.DatetimeIndex([datetime.datetime(2011, 1, 3), datetime.datetime(2011, 1, 4)], freq='M', name='dtidx')
inserted = dtidx.insert(0, datetime.datetime(2011, 1, 5))
inserted.name
# None

@jreback jreback added this to the 0.14.1 milestone Jun 1, 2014
@jreback
Copy link
Contributor

jreback commented Jun 1, 2014

did you test if adding a new date to a time-series with a given frequency then changes to None if the inferred different? (e.g. adding a 2nd day of the month to a MS or something). note that I think this check should be done before passing to the constructor as its more efficient (no need to really reinfer the freq). - note that this is NOT true for deleting (and already merged that)

@sinhrks
Copy link
Member Author

sinhrks commented Jun 2, 2014

@jreback Added a logic to check the item is on freq before inserting. Is it correct that inserting '2011-01-07' to Daily frequency index [2011-01-01, 2011-02, 2011-03'] is still Daily frequency? (Frequency is valid if all the values are on the offset, and unrelated to sorted / continuous).

And should set inferred_freq in case of the result's freq is None?

@jreback
Copy link
Contributor

jreback commented Jun 2, 2014

I am pretty sure that insert will almost always reset the freq to None (I think it reinfers inside the constructor).

just want to tests (and you example is not True, freq is None)

In [4]: i = pd.date_range('20130101',periods=3).insert(3,Timestamp('20130107'))

In [5]: i
Out[5]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-01-07]
Length: 4, Freq: None, Timezone: None

In [6]: pd.infer_freq(i)

@immerrr
Copy link
Contributor

immerrr commented Jun 2, 2014

Sorry, for barging in, I could've sworn I've seen code like this elsewhere. I've finally remembered: it was the subclassing guide. in numpy docs.

As long as Index is a subclass of ndarray (btw, are there plans to decouple Index from ndarray?), it seems a better way to overload __array_wrap__ to transfer metadata to new objects, otherwise you're going to need similar workarounds for a lot of functions in numpy.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2014

hmm, @immerrr that is right....a better way (I thought that was already done....guess not)

yes, in theory plans to decouple Index from ndarray, but @jtratner not really actively working on this. Its not that hard actually. interested?

@sinhrks
Copy link
Member Author

sinhrks commented Jun 2, 2014

OK. Based on inferred_freq, I understand that values are all on the offset, continuous and sorted to be the specific frequency.

Because timezone is corrupted if values are directly passed to DatetimeIndex.__init__, I use _simple_new and set inferred_freq separately (like current union does). I'll check __array_wrap__ if it makes better implementation.

didx = pd.DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# M

didx = pd.DatetimeIndex(['2000-01-31', '2000-03-31', '2000-02-29', '2000-04-30'])
didx.inferred_freq
# None

didx = pd.DatetimeIndex(['2000-01-31', '2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# None

didx = pd.DatetimeIndex(['2000-01-31', '2000-03-31', '2000-04-30'])
didx.inferred_freq
# None

#7302 should also be fixed because delete may result in intermittent values?
Fortunately I think my other previous PR looks not affected...

@immerrr
Copy link
Contributor

immerrr commented Jun 2, 2014

@jreback I'm now in a rather deadline-ish mode just skimming through mail in spare time. Once I'm done with that, I could take a stab at it.

@jreback
Copy link
Contributor

jreback commented Jun 2, 2014

@sinhrks yep I think that is correct; you basically DON't want to pass in a freq when you insert and let it reinfer (what I was saying is a) test this b) you can prob do a quick check if the inserted matches the same freq and would be faster so you CAN pass in a new freq).

also, I think delete should work the same (so need to fix that too).

maybe a function _maybe_infer or soething which you pass the kwargs and the new array

@rosnfeld
Copy link
Contributor

rosnfeld commented Jun 2, 2014

Edit: moving my comment to #7302

@sinhrks
Copy link
Member Author

sinhrks commented Jun 3, 2014

@jreback. Thanks. But I think checking whether the constructed DTI meets the freq in advance is little difficult in some cases. For example:

# If target is on the edge, easy to check
pd.DatetimeIndex(['2011-01-01, '2011-01-02', '2011-01-03', '2011-01-04']).insert(4, pd.Timestamp('2011-01-05'))

# But following is difficult, calling infer_freq after construction is easier.
pd.DatetimeIndex(['2011-01-01, '2011-01-03', '2011-01-04']).insert(1, pd.Timestamp('2011-01-02'))

I've modified to call infer_freq after index construction, but is it different from your idea?

@jreback
Copy link
Contributor

jreback commented Jun 3, 2014

I think the constructor should infer it (I just don't remember exactly how/when this happens), and in fact you may not want to infer it (e.g. let it happen automatically if necessary). Which I think is the prior behavior. So maybe don't don't explicity infer at all (i was just thinking it would be faster/easier to do that).

@jreback
Copy link
Contributor

jreback commented Jun 3, 2014

@sinhrks I was thinking that checking the edge cases only for both insert/delete makes sense (for delete its easy as it would be the same freq), insert if the new stamp obeys the current freq. otherwise just pass thru freq=None which doesn't automatically infer (i think), also see @rosnfeld comment #7302 (which I think is related to the automatic inferring)

@sinhrks
Copy link
Member Author

sinhrks commented Jun 3, 2014

@jreback OK. I've modified, and hope to meet what you saying. I feel the check is very specific logic and not split to a separate function for now.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2014

that looks right

@jreback
Copy link
Contributor

jreback commented Jun 3, 2014

@rosnfeld can you pull in this PR and see if you are talking about in #7302 is changed / same

@sinhrks
Copy link
Member Author

sinhrks commented Jun 3, 2014

Thanks. I understand @rosnfeld is pointing the difference caused by #7302 (Index.delete), not included in this PR (Index.insert). I've prepared the same fix for #7302 and issue separate PR.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2014

ok, that's fine (but its actually the same issue),

@rosnfeld
Copy link
Contributor

rosnfeld commented Jun 3, 2014

I don't see the code for this pull request changing anything about the delete behavior from what was already done in #7302, but #7320 breaks things, as noted on that PR. I can continue the conversation here or there, whichever you prefer.

@jreback
Copy link
Contributor

jreback commented Jun 5, 2014

@sinhrks pls fix this up similarly to .delete and I think good 2 go

@sinhrks
Copy link
Member Author

sinhrks commented Jun 6, 2014

@jreback Because .insert accepts only a single value, current logic should work.

numpy.insert can accepts list of indices/values as of v1.8, it is an option to change the API. To do this, normal Index should also be fixed.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.insert.html

jreback added a commit that referenced this pull request Jun 6, 2014
BUG: DatetimeIndex.insert doesnt preserve name and tz
@jreback jreback merged commit 64d2160 into pandas-dev:master Jun 6, 2014
@jreback
Copy link
Contributor

jreback commented Jun 6, 2014

this is fine for now...in the future could extend it; maybe add an issue for 0.15?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants